NSF PAR Search | NSF Public Access Repository

Block diffusion: Interpolating between autoregressive and diffusion language models

Arriola, Marianne; Gokaslan, Aaron; Chiu, Justin; Yang, Zhihan; Qi, Zhixuan; Han, Jiaqi; Sahoo, Subham; Kuleshov, Volodymyr (April 2025, ICLR)

Full Text Available
Diffusion Models With Learned Adaptive Noise

Sahoo, Subham; Gokaslan, Aaron; De Sa, Christopher; Kuleshov, Volodymyr (December 2024, NeurIPS 2024)

Diffusion models have gained traction as powerful algorithms for synthesizing high-quality images. Central to these algorithms is the diffusion process, a set of equations which maps data to noise in a way that can significantly affect performance. In this paper, we explore whether the diffusion process can be learned from data. Our work is grounded in Bayesian inference and seeks to improve log-likelihood estimation by casting the learned diffusion process as an approximate variational posterior that yields a tighter lower bound (ELBO) on the likelihood. A widely held assumption is that the ELBO is invariant to the noise process: our work dispels this assumption and proposes multivariate learned adaptive noise (MuLAN), a learned diffusion process that applies noise at different rates across an image. Our method consists of three components: a multivariate noise schedule, adaptive input-conditional diffusion, and auxiliary variables; these components ensure that the ELBO is no longer invariant to the choice of the noise schedule as in previous works. Empirically, MuLAN sets a new state-of-the-art in density estimation on CIFAR-10 and ImageNet while matching the performance of previous state-of-the-art models with 50% fewer steps. We provide the code, along with a blog post and video tutorial on the project page: https://s-sahoo.com/MuLAN
Full Text Available
The GAN is dead; long live the GAN! A Modern Baseline GAN

Huang, Nick; Gokaslan, Aaron; Kuleshov, Volodymyr; Tompkin, James (December 2024, NeurIPS)

Full Text Available
Cross-species modeling of plant genomes at single-nucleotide resolution using a pretrained DNA language model

https://doi.org/10.1073/pnas.2421738122

Zhai, Jingjing; Gokaslan, Aaron; Schiff, Yair; Berthel, Ana; Liu, Zong-Yan; Lai, Wei-Yun; Miller, Zachary R; Scheben, Armin; Stitzer, Michelle C; Romay, M Cinta; et al (June 2025, Proceedings of the National Academy of Sciences)

Interpreting function and fitness effects in diverse plant genomes requires transferable models. Language models (LMs) pretrained on large-scale biological sequences can capture evolutionary conservation and offer cross-species prediction better than supervised models through fine-tuning on limited labeled data. We introduce PlantCaduceus, a plant DNA LM that learns evolutionary conservation patterns in 16 angiosperm genomes by modeling both DNA strands simultaneously. When fine-tuned on a small set of labeled Arabidopsis data for tasks such as predicting translation initiation/termination sites and splice donor/acceptor sites, PlantCaduceus demonstrated remarkable transferability to maize, which diverged 160 Mya. The model outperformed the best existing DNA language model by 1.45-fold in maize splice donor prediction and 7.23-fold in maize translation initiation site prediction. In variant effect prediction, PlantCaduceus showed performance comparable to state-of-the-art protein LMs. Mutations predicted to be deleterious by PlantCaduceus showed threefold lower average minor allele frequencies compared to those identified by multiple sequence alignment-based methods. Additionally, PlantCaduceus successfully identifies well-known causal variants in both Arabidopsis and maize. Overall, PlantCaduceus is a versatile DNA LM that can accelerate plant genomics and crop breeding applications.
Full Text Available
Simple and Effective Masked Diffusion Language Models

Sahoo, Subham; Arriola, Marianne; Schiff, Yair; Gokaslan, Aaron; Marroquin, Edgar; Chiu, Justin; Rush, Alexander; Kuleshov, Volodymyr (December 2024, NeurIPS)

Full Text Available
Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling

Schiff, Yair; Kao, Chia-Hsiang; Gokaslan, Aaron; Dao, Tri; Gu, Albert; Kuleshov, Volodymyr (July 2024, International Conference on Machine Learning)

Full Text Available
Advancing DNA Language Models: The Genomics Long-Range Benchmark

Kao, Chia-Hsiang; Trop, Evan; Polen, McKinley; Schiff, Yair; P de Almeida, Bernardo; Gokaslan, Aaron; Pierrot, Thomas; Kuleshov, Volodymyr (May 2024, ICLR 2024 Workshop on Machine Learning for Genomics Explorations)

InfoDiffusion: Representation Learning Using Information Maximizing Diffusion Models

Wang, Yingheng; Schiff, Yair; Gokaslan, Aaron; Pan, Weishen; Wang, Fei; De Sa, Christopher; Kuleshov, Volodymyr (July 2023, International Conference on Machine Learning)

DataComp-LM: In search of the next generation of training sets for language models

Li, Jeffrey; Fang, Alex; Smyrnis, Georgios; Ivgi, Maor; Jordan, Matt; Gadre, Samir; Bansal, Hritik; Guha, Etash; Keh, Sedrick; Arora, Kushal; et al (April 2025, https://doi.org/10.48550/arXiv.2406.11794)

The authors introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments aimed at improving language models. DCLM provides a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants can experiment with dataset curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters. As a baseline, the authors find that model-based filtering is critical for assembling a high-quality training set. Their resulting dataset, DCLM-Baseline, enables training a 7B parameter model from scratch to achieve 64% 5-shot accuracy on MMLU with 2.6T training tokens. This represents a 6.6 percentage point improvement over MAP-Neo (the previous state-of-the-art in open-data LMs), while using 40% less compute. The baseline model is also comparable to Mistral-7B-v0.3 and Llama 3 8B on MMLU (63% and 66%), and performs similarly on an average of 53 NLU tasks, while using 6.6x less compute than Llama 3 8B. These findings emphasize the importance of dataset design for training LMs and establish a foundation for further research on data curation.
Full Text Available
TöRF: Time-of-Flight Radiance Fields for Dynamic Scene View Synthesis

Attal, Benjamin; Laidlaw, Eliot; Gokaslan, Aaron; Kim, Changil; Richardt, Christian; Tompkin, James; O'Toole, Matthew (January 2021, Advances in Neural Information Processing Systems)

Neural networks can represent and accurately reconstruct radiance fields for static 3D scenes (e.g., NeRF). Several works extend these to dynamic scenes captured with monocular video, with promising performance. However, the monocular setting is known to be an under-constrained problem, and so methods rely on data-driven priors for reconstructing dynamic content. We replace these priors with measurements from a time-of-flight (ToF) camera, and introduce a neural representation based on an image formation model for continuous-wave ToF cameras. Instead of working with processed depth maps, we model the raw ToF sensor measurements to improve reconstruction quality and avoid issues with low reflectance regions, multi-path interference, and a sensor's limited unambiguous depth range. We show that this approach improves robustness of dynamic scene reconstruction to erroneous calibration and large motions, and discuss the benefits and limitations of integrating RGB+ToF sensors now available on modern smartphones.
Full Text Available
